k-nearest Neighbor Classification on Spatial Data Streams Using P-trees
نویسندگان
چکیده
Classification of spatial data has become important due to the fact that there are huge volumes of spatial data now available holding a wealth of valuable information. In this paper we consider the classification of spatial data streams, where the training dataset changes often. New training data arrive continuously and are added to the training set. For these types of data streams, building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. For that reason KNN is called a lazy classifier. KNN is extremely simple to implement and lends itself to a wide variety of variations. The traditional k-nearest neighbor classifier finds the k nearest neighbors based on some distance metric by finding the distance of the target data point from the training dataset, then finding the class from those nearest neighbors by some voting mechanism. There is a problem associated with KNN classifiers. They increase the classification time significantly relative to other non-lazy methods. To overcome this problem, in this paper we propose a new method of KNN classification for spatial data streams using a new, rich, data-mining-ready structure, the Peano-count-tree or Ptree. In our method, we merely perform some logical AND/OR operations on P-trees to find the nearest neighbor set of a new sample and assign the class label. We have fast and efficient algorithms for AND/OR operations on P-trees, which reduce the classification time significantly, compared with traditional KNN classifiers. Instead of taking exactly the k nearest neighbors we form a closed-KNN set. Our experimental results show closed-KNN yields higher classification accuracy as well as significantly higher speed.
منابع مشابه
K-Nearest Neighbor Classification on Spatial Data Streams
Classification of spatial data has become important due to the fact that there are huge volumes of spatial data now available holding a wealth of valuable information. In this paper we consider the classification of spatial data streams, where the training dataset changes often. New training data arrive continuously and are added to the training set. For these types of data streams, building a ...
متن کاملDetection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms
acquisition field reference data using conventional methods due to limited and time-consuming data from a single tree in recent years, to generate reference data for forest studies using terrestrial laser scanner data, aerial laser scanner data, radar and Optics has become commonplace, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...
متن کاملk-Nearest Neighbor Classification on Spatial Data
Classification of spatial data streams is crucial, since the training dataset changes often. Building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to a wide variety of variatio...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کامل